DOCUMENT RESUME

ED 084 200                                    SO 006 619

TITLE            Evaluation Comment, The Journal of Educational
                 Evaluation. Volume 3, Number 4.
INSTITUTION      California Univ., Los Angeles. Center for the Study
                 of Evaluation.
PUB DATE         Dec 72
NOTE             8p.
AVAILABLE FROM   Center for the Study of Evaluation, 145 Moore Hall,
                 University of California, Los Angeles, Los Angeles,
                 California 90024 (Issued Quarterly; Free to
                 professional educators)
EDRS PRICE       MF-$0.65 HC-$3.29
DESCRIPTORS      Course Evaluation; Curriculum Evaluation;
                 *Educational Objectives; Educational Theories;
                 Evaluation; *Evaluation Criteria; *Evaluation
                 Methods; Formative Evaluation; *Newsletters;
                 Opinions; Summative Evaluation; *Theoretical
                 Criticism
IDENTIFIERS      GFE; *Goal Free Evaluation



ABSTRACT 

Each quarterly issue of this journal, available free to professional educators, discusses topics in educational evaluation by presenting articles on evaluation theory, procedures, methodologies, or practices. The topic of the six articles in this newsletter is an examination of goal free evaluation (GFE). Michael Scriven discusses the role of goal free evaluation in formative and summative evaluation, especially in the evaluation of unintended effects, observes favorable considerations of this method, and presents methodological analogies of GFE in fields other than education. Daniel L. Stufflebeam criticizes Scriven's position and develops four questions he feels to be important in assessing the merit of GFE. Marvin C. Alkin writes that GFE does recognize goals, but that they are wider-context goals rather than specific objectives of a program. James Popham proposes that the GFE concept emphasizes results rather than rhetoric and provides a useful caution to educators who are overly enamored of instructional objectives. George F. Kneller argues with the logic of Scriven's argument and finds the issue one of taste rather than of theory. (KSM)



The Journal of Educational Evaluation






U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



December 1972    Vol. 3, No. 4







Center for the Study of Evaluation 
UCLA 



PROSE AND CONS ABOUT GOAL-FREE EVALUATION 

Michael Scriven 



Introduction 

In the winter of 1970-71, the National Center for Educational Communications of USOE asked ETS to evaluate the disseminable products of the regional labs and R&D centers. The reward for success was to be substantial grants to assist dissemination. ETS set up an external committee to do the evaluation, under the chairmanship of David Krathwohl, and provided very extensive and excellent staff support for what had to be a rather rapid review. In order to standardize the practice as well as the products of the committee (on which I served), I began to develop a standard form to serve as a check list for us and, when filled out, as a summary for ETS and NCEC. There were originally about 70 entries in what became known as the Product Evaluation Pool, and they ranged from toys for pre-schoolers through publications on teacher training and bilingual curricula, to vast new systems for managing schools. On these, we had varying amounts of data about field trials, mostly very thin; we had the write-ups by the producing staff and other observers; and we had the products themselves. Other input was the list of current USOE priorities in education.

It seemed very natural to start off the evaluation form with a rating of goals of the project and to go on with a rating of the effectiveness in meeting them, costs, etc. By the sixth draft of the form, another item had become very prominent, namely side-effects. Naturally, these had also to be rated, and in one case a product finished up in the Top Ten in spite of zero results with respect to its intended outcomes because it did so well on an unanticipated effect.



Writing on goal-free evaluation:

Michael Scriven is a Professor of Philosophy at the University of California, Berkeley.

Daniel L. Stufflebeam is the Director of the Evaluation Center and Professor of Education at Ohio State University.

Marvin C. Alkin is the Director of the Center for the Study of Evaluation and Associate Professor of Education at UCLA.

George F. Kneller is a Professor of Education at UCLA.



Intended and unintended effects — why distinguish? 

Reflecting on this experience later, I became increasingly uneasy about the separation of goals and side-effects. After all, we weren't there to evaluate goals as such; that would be an important part of an evaluation of a proposal, but not (I began to think) of a product. All that should be concerning us, surely, was determining exactly what effects this product had (or most likely had), and evaluating those, whether or not they were intended.

In fact, it was obvious that the rhetoric of the original proposal which had led to a particular product was frequently put forward as if it somehow constituted supporting evidence for the excellence of the product. This rhetoric was often couched in terms of the "in" phrases of five-year-old educational fads, sometimes given a swift updating with references to the current jargon or lists of educational priorities. That is, the rhetoric of intent was being used as a substitute for evidence of success. Was it affecting us? It would be hard to prove it didn't. And it contributed nothing, since we were not supposed to be rewarding good intentions.

Furthermore, the whole language of "side-effect" or "secondary effect" or even "unanticipated effect" (the terms were then used as approximate synonyms) tended to be a put-down of what might well be the crucial achievement, especially in terms of new priorities. Worse, it tended to make one look less hard for such effects in the data and to demand less evidence about them, which is extremely unsatisfactory with respect to the many potentially very harmful side-effects that have turned up over the years.

It seemed to me, in short, that consideration and evaluation of goals was an unnecessary but also a possibly contaminating step. I began to work on an alternative approach: simply, the evaluation of actual effects against (typically) a profile of demonstrated needs in this region of education. (This is close to what Consumers' Union actually does.) I call this Goal-Free Evaluation (GFE).

Goal-free formative evaluation

At first, it seemed that the proper place for goal-free evaluation (GFE) was in the summative role, like the NCEC activity. In the formative situation, the evaluator's principal task must surely be telling the producer whether the project's goals were being met.

But the matter is not so simple. A crucial function of good formative evaluation is to give the producer a preview of the summative evaluation. Of course, a producer has made the bet that if the goals of the project are achieved, the summative evaluation will be, or should be, favorable. But one can scarcely guarantee the non-occurrence of undesirable side-effects, and one should not overlook the possibility of desirable ones that can be cultivated with some care and attention in later developmental cycles. Now, who is going to give the producer a sneak preview of summative results? The staff evaluator will try, and often can do a very good job. But that role is not conducive to objectivity: not only is it dependent on the payroll (and hence one where criticism can produce resentments with which the evaluator will have to live), but it is also very quickly tied in to the production activity. Typically, the staff evaluators are the actual authors of most of the tests in curriculum products, and responsible for some of the form and content of much of the rest. Finally, the staff person is likely to have occupational tunnel-vision with respect to the effects of the materials (or methods, etc.), that is, a tendency to look mainly in the direction of the announced goals.

Hence, it now seems to me that a producer or staff evaluator who wants good formative evaluation has got to use some external evaluators to get it. Using them does not render the staff evaluator redundant; on the contrary, implementation or correction of the external evaluation depends in large part on the staff person. Psychologically, the staff evaluator may find it priceless to have support from an external source for some personal (and previously unshared) worries or complaints. Now, what I have said so far supports a practice of many producers in using external evaluators. But what I have said also implies, because it springs from the hunt for objectivity and independence, the desirability of arranging goal-free conditions for the external evaluator.

As summative evaluation becomes increasingly goal-free (and I believe it will), the formative evaluation must do so to preserve the simulation. But forget that point; the same conclusion is forced on us by interest in picking up what are for the producer "side-effects." The less the external evaluator hears about the goals of the project, the less tunnel-vision will develop, and the more attention will be paid to looking for actual effects (rather than checking on alleged effects).

Other favorable considerations 

Look at the effects of considering goals on those who formulate them. It is likely to seem to them that it will pay better to err in the direction of grandiose goals rather than modest ones, as one can see from experience in reading proposals requesting funds, where it's entirely appropriate to evaluate goals. This strategy assumes that a gallant try at Everest will be perceived more favorably than successful mounting of molehills. That may or may not be so, but it's an unnecessary noise source for the evaluator.

The alleged goals are often very different from the real 
goals. Why should the evaluator get into the messy job of 
trying to disentangle that knot? 

The goals are often stated so vaguely as to cover both desirable and undesirable activities, by almost anyone's standards. Why try to find out what was really intended, if anything? (Similarly, the stated goals often conflict; why try to decide which one should supervene?)

A trickier point. The identification of "side-effects" with 
"unanticipated effects" is a mistake. Goals are only a sub- 
set of anticipated effects; they are the ones of special 
importance, or the ones distinctive of this project. (For 
example, the goals of a new math curriculum project do not 
usually include "employing a secretary to type up corrected 
copy," but of course that effect is anticipated.) Hence, 
"side-effects" includes more phenomena than "unantici- 
pated effects," and some of the ones it alone includes may 
be important. In short, evaluation with respect to goals does 
not even include all the anticipated effects and gives much 
too limited a profile of the project. Why get into the business of trying to make distinctions like this?

Evaluation Comment, Page 2



Since almost all projects either fall short of their goals or over-achieve them, why waste time rating the goals, which usually aren't what is achieved?

GFE is unaffected by, and hence does not legislate against, the shifting of goals midway in a project. Given the amount of resentment caused by evaluation designs that require rigidity of the treatment throughout, this is an important benefit. But it's a real advantage only to the extent that the project remains within the much larger but still finite ballpark the GFE has carved out of the jungle of possible effects.

Unfavorable considerations— methodological and practical 

These are usually an amalgam of criticisms from various 
sources, sometimes real quotes. 

"The GFE'r simply substitutes his own goals for those of 
the project." No. The GFE may use USOE's goals, or what the best evidence identifies as the needs of the nation, as standards; but simply to use his (or her) own personal preferences would obviously be to invalidate the evaluation. One needs standards of merit for an evaluation, indeed; the error is to think these have to be the goals of the evaluator or the evaluated. Another, commonly connected, error is to think that all standards of merit are arbitrary or subjective. There's nothing subjective about the claim that we need a cure for cancer more than a new brand of soap. The fact that some people have the opposite preference (if true) doesn't even weakly undermine the claim about which
of these alternatives the nation needs most. So the GFE 
may use needs and not goals, or the goals of the consumer 
or the funding agency. Which of these is appropriate de- 
pends on the case. But in no case is it proper to use anyone's 
goals as the standard unless they can be shown to be the 
appropriate ones and morally defensible. 

"Great idea— but hopelessly impractical. You can never 
keep the evaluator from inferring the goals of the project." 
This is certainly false; I and others have done evaluations 
where only the feeblest guesses would be possible, and of 
no great interest. If you control the data going to the evalua- 
tor, you can obviously reduce it to the point where goals 
are not inferable. And interesting— not exhaustive— evalua- 
tions are still possible. An evaluator with considerable 
experience of goal-based evaluation does indeed find it 
tempting, in fact almost neurotically necessary, to reach 
for the security blanket of goals. But once one learns to do 
without it, then, like riding a bicycle or swimming without 
the aids one uses at first, there is a remarkable sense of 
freedom, of liberation. 

"Why use an evaluator who only gets part of the data — 
you simply increase the chance that some of the most 
important effects (which happen to have been intended) 
will be missed?" Yes, this is the trade-off. The value of 
GFE does not lie in picking up what everyone already
"knows," but in noticing something that everyone else has
overlooked, or in producing a novel overall perspective. Of
course, when summative time comes around, the intended 
effects had better be large enough to be obvious to the 
unaided (but expert) eye or, in general, they aren't worth 
very much. (The same is therefore true to a lesser extent 
for formative evaluation.) 

"Attacking the emphasis on careful goal-formulation 
approaches can only lead to poor planning, a catch-as-catch- 
can approach, and general carelessness— which you are 
giving intellectual sanction." Planning and production re- 
quire goals, and formulating them in testable terms is abso- 
lutely necessary for the manager as well as the internal 
evaluator who keeps the manager informed. That has noth- 
ing to do with the question of whether the external evalua- 
tor needs or should be given any account of the project's
goals. 



"I still can't see how GFE is supposed to work in practice. You can't test for all possible effects, and it's surely absurd to think you shouldn't even bother with testing the real goals." The external evaluator is not there to test goals, but rather to evaluate achievement, which turns out to be conceptually distinct, and often different in practice, too. As to the idea that GFE requires testing for every possible effect, the best reply is to say that any evaluator worth hiring has to look for side-effects, and there's no limitation on where or in what form they crop up. So even the goal-based evaluator (GBE'r) has to do this allegedly impossible task. (And so, for that matter, does any applied scientist searching for the effects of a new drug, or the scientist looking for unknown causes of an important effect, e.g., death or cancer; except that he searches for every possible cause, not effect.) The GFE'r looks at the treatment and/or curricular materials, after all, and can immediately formulate some hypotheses about probable effects, based on previous experience and knowledge of the research literature. Often, too, the GFE'r can look at the results of quizzes etc., though it's desirable to do that after formulating the hypotheses just mentioned, to avoid premature fixation on the variables of concern to the project.

"I'm afraid the GFE is going to be seen as a threat by many producers, perhaps enough to prevent its use." It's true that even GBE was and is so threatening that its introduction has been prevented or rendered useless on many projects. But it has gradually become increasingly a requirement, and the standards for it are creeping upwards. The same is likely to be true of GFE. Now it's important to see why GFE is more of a threat. Primarily this is because the GFE'r is less under the control of management; not only are the main variables no longer specified by management, but they may not even include those that management has been advertising. The reactions by management to GFE have really brought out the extent to which evaluation has become, or has come to seem, a controllable item, an unhealthy situation. The idea of an evaluator who won't even talk to you for fear of contamination can hardly be expected to make the producer rest easy. It's probably very important, psychologically, to talk to your judge, to feel you've got across a sense of your mission, the difficulties, etc. We all have some faith in "tout comprendre c'est tout pardonner" (to understand all is to forgive all). But the evaluator isn't our judge, just the judge of something we've produced. Even if it's not much good, there's a long way to go before blame can be laid at the producer's door. If a producer really cares about quality control, it won't do to insist that the project's definition of quality must be used.

Methodological analogies of GFE (in other fields) 

The Intentional Fallacy. In the field of aesthetics it has been widely but not universally accepted that it is fallacious for a critic to consider the intentions of the artist in assessing the work of art. If the "meaning" doesn't show, it doesn't (or shouldn't) count. I am inclined to think this is a perverse view, a purist limit that goes beyond the bounds of sense. The titles of paintings, the locale of photographers, program notes at the symphony, the period of a building, even the biographies of Russian novelists "cast new light on" the art object itself, and are interesting in themselves. The fallacy is to suppose that the only legitimate framework in which to see a work of art is as an autonomous entity. Art can enlighten, it can give pleasure, it can communicate feeling, and so on, and there's nothing in there that says the background and context of the artwork can't contribute. It's really a case where the consumer can choose. One may say that assessing the artist legitimately brings in these considerations, but assessing the artwork does not. But the slight attraction of this "tidying-up" move scarcely amounts to a compelling argument for any reasonable man.

In the educational materials production situation, on the other hand, as in the consumer field in general, we can usually establish that the intentions of the producer are of negligible concern to the consumer by comparison with satisfactory performance on the criterion dimensions (e.g., gains in reading scores). Not only is this so, but there seems to be little reason why it shouldn't be so. When the history of educational R&D is written (if ever historians can be found to stoop to such a low-status task which happens to be socially valuable), then the intentions of producers will be of great interest. For the future producer, a study of these may be far more valuable than a study of the products.

So the "intentional fallacy" is not, in my view, a fallacy 
in the area where the term was introduced— but it would be 
one in the evaluation of consumer goods. 

Motives and Morality. A tremendous tension has long existed in philosophical ethics between those who believe that the morality of acts is principally determined by their motivation ("He meant well") and those who would assess acts in terms of their consequences alone ("Write that on his gravestone; first, he should be shot"). Current pop ethics is on the conscience trip; the "pragmatist" is seen as the opposition.

The special feature of this case is that the act involves the motive in a much more intimate way than the product involves the producer's intent. It has been argued that the same physical motions performed with different intentions are definitionally a different act; the distinctions between manslaughter and murder, between borrowing and theft, and between erring and lying, for example, are said to be distinctions between different acts. One cannot argue that a programmed text supposed to teach economics better than the competition, but which actually teaches reading better (and economics the same), is crucially different for the consumer from one in which the side-effect was the primary aim of the producer. And it is for just this reason that I prefer the role of the GFE'r for summative evaluation.

On the philosophical issue, I prefer to say that neither exclusive position is defensible, and that the issue is resolved one way or the other in particular cases where the point of the evaluation becomes clear.

Double-Blind Designs. A correspondent writes, "The so-called 'double-blind' medical experiment isn't blind in terms of goal or purpose. A treatment is being tested for its effect on a specific disease. The 'blind' is strictly in terms of the S's or E's knowledge of who is getting what treatment. Thus I think your use of the analogy is inappropriate." The analogy is not intended to be an identity. The point of the analogy is to remind one that medical research, until the scurvy study, ignored the error due to the agent and evaluator knowing that the treatment being given to a particular patient was a dummy. Not only did this affect the agent's behavior in giving it, but it affected the evaluator's care in assessing the effects. After all, how could one seriously look for therapeutic results from a sugar-pill? "Blinding" the assessor made the search equally careful in both cases. Analogously, "blinding" the educational evaluator ensures (to the maximum possible extent?) equal care in looking for effects that happen not to have been goals. Now it's true that the GFE'r may make it the first order of business to infer the goals of the producer. In fact, that's what happened in the second GFE study of which I have received details. (But in the medical case this is often possible, too. In 1958 or so I spent a great deal of time refining placebo effect research designs; the problems of matching for the taste and side-effects of the experimental drug, amongst other difficulties, are typically not solvable.) All one can do is to make it as hard as possible. In particular, one can try to cut out cues which allow inference of intent other than via noticing success. It's not disastrous if the medical researcher infers from the results that treatment B must have been the new medication, treatment A the placebo. The inference may or may not be correct; it can only be damaging if it is made during the experiment and hence might influence the later procedures. But even that possibility can usually be handled by splitting the role of recorder from that of agent. By analogy, we cannot get too worried about an evaluator who, seeing massive gain scores on an addition-of-integers test, infers that a major goal of the materials was to improve addition of integers. On the other hand, we must try to avoid having the evaluator come to this conclusion by reading the introduction to the materials, because that is likely to corrupt his later perceptions. When the evaluator devises special instruments for assessing inventory on a parameter that has not previously been tested, we can isolate the role of the agent doing the testing from the role of the scorer, and we can arrange that the scorer does not know the pretests from the posttests, or the experimental group's tests from the control group's tests.
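The scorer-blinding arrangement described above can be sketched in code. This is a minimal illustration, not a procedure from the article: it assumes test papers are kept as records tagged with group (experimental or control) and occasion (pretest or posttest), and the names `Paper`, `blind_for_scoring`, and `unblind` are hypothetical. The scorer receives only shuffled, coded answer sheets; the key linking codes to group and occasion is held by someone else until scoring is finished.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """One completed test, tagged with information the scorer must not see."""
    student: str
    group: str      # hypothetical labels: "experimental" or "control"
    occasion: str   # hypothetical labels: "pretest" or "posttest"
    answers: str


def blind_for_scoring(papers, seed=None):
    """Shuffle the papers and replace identifying tags with neutral codes.

    Returns (packet, key): packet is a list of (code, answers) pairs for the
    scorer; key maps each code back to its Paper and is kept by someone
    other than the scorer.
    """
    rng = random.Random(seed)
    shuffled = list(papers)
    rng.shuffle(shuffled)
    packet = []
    key = {}
    for index, paper in enumerate(shuffled):
        # Codes are assigned after shuffling, so they carry no group or occasion information.
        code = f"P{index:03d}"
        packet.append((code, paper.answers))
        key[code] = paper
    return packet, key


def unblind(scores, key):
    """After scoring is complete, attach each score to its group and occasion."""
    return [(key[code].group, key[code].occasion, score) for code, score in scores.items()]


if __name__ == "__main__":
    papers = [
        Paper("s1", "experimental", "pretest", "abc"),
        Paper("s1", "experimental", "posttest", "abcd"),
        Paper("s2", "control", "pretest", "ab"),
        Paper("s2", "control", "posttest", "abc"),
    ]
    packet, key = blind_for_scoring(papers, seed=7)
    # Placeholder scoring rule for illustration only: score = length of the answer string.
    scores = {code: len(answers) for code, answers in packet}
    for row in unblind(scores, key):
        print(row)
```

Because the codes are assigned after shuffling, neither a code nor its position in the packet tells the scorer which papers are pretests or which come from the control group; the grouping is restored only when the key holder runs the unblinding step.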

In the early GFE just mentioned, where the evaluator worked diligently to reconstruct the goals, he was doing this by observing various effects which seemed desirable. He concluded that these were probably intended. But the step of inferring goals was totally unnecessary; he could just as well have left the matter by noting the desirable results. Similarly, where he inferred failure (e.g., at teaching the inquiry approach), he could just as well have made no comment, or noted lack of performance in this desirable dimension, from which the evaluend can conclude failure.

Finally, although it is typical of the medical situation that 
a major parameter is identified in advance, no evaluation 
of drugs today can avoid the search for side-effects, from 
the most remote area of the symptom-spectrum. Nor is this 
obligation restricted to Federal checks; the formative evalu- 
ation of drugs requires that the manufacturer run studies 
that are both double-blind and side-effect sensitive. It 
would not be difficult to run these evaluations goal-free, 
but it has little point; given only the characteristics of the 
patients to be treated, the goal of the treatment would be 
fairly obvious. In education, the situation is different—more 
like preventive medicine. 

In sum, I think there's an illuminating analogy between 
the move to double-blind methodology and the (further) 
move to GFE. The gains from double-blind were not signif- 
icant in the physical sciences— it was an innovation of great 
value to medicine. The gains from GFE are not great for med- 
icine—but it is an innovation that may pay off for education. 



SHOULD OR CAN EVALUATION BE GOAL-FREE? 

Daniel L. Stufflebeam



Evaluation is... a methodological activity which ... consists simply in the gathering and 
combining of performance data with a weighted set of goal scales to yield either com- 
parative or numerical ratings, and in the justification of (a) the data-gathering instruments, 
(b) the weightings, and (c) the selection of goals. 

Michael Scriven, The methodology of evaluation. AERA Monograph Series on Curriculum
Evaluation, Book 1. Chicago: Rand McNally & Company, 1967, pp. 39ff.






In setting forth the above definition of evaluation, 
Michael Scriven emphasized that evaluators must evaluate 
goals. The following is a critique of his more recent position 
that evaluators should pay no attention to goals. In this 
regard, I will list and respond to four questions that I 
believe to be important in assessing the merit of goal-free 
evaluation (GFE). 

Question— Should GFE be considered as a possible alternative 
to existing models of evaluation? 

Answer — No. GFE has been proposed as one methodo- 
logical strategy that can be used to supplement others, in- 
cluding goal-based evaluation (GBE) and the evaluation of 
goals. This is consistent with Scriven's past practice of 
analyzing evaluation in order to identify and describe the 
many kinds of evaluation that evaluators need to be able to 
perform. In addition to GFE he has proposed formative,
summative, intrinsic, payoff, meta, fact-free (with tongue 
in cheek, I hope), and pathway evaluation. Scriven has not 
offered any one of these evaluation types, nor all of them 
collectively, as a theory or model of evaluation. Thus, we 
should consider GFE in its proper perspective as one stra- 
tegy that can be used in conjunction with others in evalua- 
tion work. 

Question: What is the essence of GFE?

Answer— It is to accurately identify effects and determine 
their importance and quality. That Scriven believes this can 
best be accomplished by preventing the evaluator from 
seeing goal statements seems to me both a secondary issue 
and an empirical question. Perhaps evaluators can be trained not to develop tunnel-vision upon seeing a set of goal statements but to use them as clues for identifying important outcome variables. The main concern is how best to insure that evaluators will identify and properly judge actual results, whether planned or not.

Question: How should GFE be conducted?

Answer: This presently is the rub. Which variables, instruments, extant data, and standards should the evaluator use? When should he gather his data? And how can program people be protected against the potentially arbitrary actions of an inept or unscrupulous goal-free evaluator, especially when he is employed by an external funding agent that may be a bureaucracy with neither a conscience nor a memory?

Presently Dr. Scriven's response seems to be that two goal-free evaluators should operate independently, beginning about midway in a project and continuing to a point after its completion. While this doesn't guarantee good quality and fair evaluation, it at least provides an opportunity to estimate the "error term" involved in GFE.

The problems of gathering data seem far from solution.
There are thousands of potentially relevant attainment 
variables and associated measuring devices, and GFE meth- 
odology does not provide much guidance for choosing 
among them. Goal statements at least provide hypotheses 
as to what some (NOT ALL) of the variables are. It would 
seem that system analyses would be helpful, but these also 
are goal-based. 






As to how to judge the GFE results, we encounter a conceptual problem. Scriven suggests that they should be compared with the results of prior needs assessments. This is sound advice, if prior needs assessments were done. But if needs assessment is the comparison of the real with an ideal, and if the ideal amounts to a prior statement of macro goals, then needs take on their meaning as a function of the discrepancy between an actual situation and prior goal statements. Hence, needs assessments are goal-based, and the use of needs assessment data to determine the value meanings of GFE observations is also goal-based. In this respect, the methodological suggestion seems sound, but it raises a question whether GFE can be goal-free. Further, based on Scriven's 1967 definition, evaluation should not be goal-free. The essence of evaluation is value judgments; these are made in relation to standards, and the standards almost always are goals.

Question: Taken in its essential meaning of accurately identifying and properly judging effects, how much can GFE contribute within a broad evaluation framework?

Answer: A great deal.

This type of GFE is the essence of identifying and judging needs, opportunities, and problems to serve as a foundation for determining goals. It is also applicable for identifying and judging alternative program strategies: solution strategies need to be assessed for their power with respect to a wide range of potential desirable impacts, not just those associated with stated goals. Also, through a comprehensive GFE of alternative program strategies, one can get a fix on the tractability of each of a range of problems and needs, not just the ones associated with the stated goals. GFE is further useful for identifying and judging a project's effects. Scriven is absolutely correct that it's unnecessary in identifying outcomes to focus on the stated project objectives. This will be done directly by the goal-based evaluators, and they probably won't have time to search out side-effects.

On the other side of the ledger, GFE will not suffice for
meeting accountability requirements. Sponsors pay money so
that certain priority needs (goals, if you will) can be met.
These needs must be evaluated, and those responsible for
meeting them must be judged in terms of their attempts and
their achievements and failures. In some cases it is
appropriate to penalize one for failing to produce what was
needed and what he agreed to produce, especially if the
evaluation revealed that the responsible agent did not try
to live up to his agreement but instead worked on something
more satisfying to him. Such determinations require the use
of GBE, although this does not diminish the desirability of
GFE.

Within this brief piece I have commented on Michael
Scriven's GFE methodological contribution. It fits in with
his pattern of analyzing various methodological aspects of
evaluation. GFE is not an alternative model of evaluation;
rather it is one evaluation strategy. The essence of the
strategy should not be to prevent evaluators from seeing
goal statements, but to insure that all relevant effects
will be accurately identified and properly judged.
Conceptually, based on Scriven's own definition and
arguments, it is questionable that GFE can or should be
goal-free. The strategy is potentially useful, but far from
operational and replicable. Because of its promise, I
believe that Scriven and others should further develop it,
test it, and report back to the profession on the effects
of GFE, whatever they turn out to be.



WIDER CONTEXT GOALS AND GOAL-BASED EVALUATORS 

Marvin C. Alkin 



In this issue of Evaluation Comment, Scriven makes some
interesting and important points in defense of what he calls
the "goal-free evaluator," the GFE. This term, GFE, is not
to be taken literally. The GFE does recognize goals (and not
just idiosyncratic ones), but they are to be wider-context
goals rather than the specific objectives* of a program.
(USOE goals are mentioned as an example.)

In addition to this broader frame of reference the GFE
is to be characterized by a scrupulous concern for
objectivity. Not only should he refuse to read the program
objectives to avoid contamination by the "rhetoric of
intent," but he should even decline to talk to the project
director.

Insofar as GFEs bring perspective, objectivity, and
independence to evaluations they are indeed "a good thing."
Manifestly, however, they are not one of the best things in
life, since they are not free. Evaluation costs money; it
removes money from program management and implementation
funds. Thus, before programs start hiring GFEs we need to
discuss what roles are to be played by an internal evaluator
(whom Scriven always assumes to exist) and an external
evaluator (including the GFE). How does the presence of one
affect the activities and responsibilities of the other, and
to what purpose is each employed?

*The GFE should perhaps be called an OFE (objective-free
evaluator), but the unfortunate auditory association with OAF
might lead one to think of him as someone who just sits
around with no particular purpose in mind.




First, let me point out that in understanding the nature 
of the evaluation to be performed, the "internal/external"
distinction is not nearly as critical as the designation of the 
decision context to be served by each evaluator. That is, 
the ultimate purpose of an evaluation is to provide
information upon which present or potential decisions are to be
made and it is this crucial factor that distinguishes evalua- 
tion from research. The nature of the evaluation that will 
be performed, framed as it is by a particular decision
context, will be dependent upon such factors as who hired the
evaluator, who receives the evaluation reports, and the 
nature of the evaluation decision that is to be made
(formative, refunding, adoption, etc.). Thus, when Scriven talks
about an "internal evaluator" I presume that he is referring
to an evaluator hired by the project director primarily to 
provide formative information for program modification 
purposes and whose reports will be directed primarily to- 
ward the project director (and perhaps secondarily to the 
sponsoring agency). In addition to this internal evaluator 
there should perhaps be an evaluator hired by the Super- 
intendent of Schools to report on the project; perhaps the 
sponsoring agency should also hire an external GBE to 
report to them. There are many decision contexts requiring
evaluation information and it is necessary to establish 
priorities on these various evaluation requirements. 

By "goal-free" Scriven simply means that the evaluator
is free to choose a wider context of goals. By his description




he implies that a goal-free evaluation is always free of the
goals of the specific program and sometimes free of the
goals of the program sponsor. In reality, then, goal-free
evaluation is not really goal-free at all but is simply
directed at a different and usually wider decision audience.
The typical goal-free evaluator must surely think (especially
if he rejects the goals of the sponsoring agency) that his
evaluation will extend at least to the level of "national
policy formulators." The question is whether this decision
audience is of the highest priority in our present concerns
for improving evaluation.

The high priority that Scriven attaches to the goal-free
evaluator seems to be based primarily upon his experience
in considering the evaluation of packaged instructional
products designed to be used widely and in a variety of
contexts. Scriven's major examples come from product
evaluations performed at only a limited number of centers,
laboratories, and other organizations that produce validated
instructional materials. Each of these organizations has an
internal evaluation staff. But the materials they are
concerned with represent merely the tip of a giant iceberg of
instructional products, most of which undergo little or no
evaluation: neither formative nor summative nor goal-based
of any kind. Moreover, when one considers problems
related to the evaluation of instructional programs (such as
the Title I, III, VII, and VIII programs) and the evaluation of
teachers (such as that mandated by the Stull Bill in
California), then the iceberg of instructional product evaluation
pales in importance compared to the Arctic sea of evaluation
problems. Thus, while it is difficult to dispute Scriven's
point that there is a role for a person called a goal-free
evaluator, one can certainly question his judgment as to
the areas of greatest "demonstrated need" in evaluation at
this time. And if one can question a goal-free evaluator on
how well he interprets "demonstrated need," what else is
left?



And so, what are the alternatives to a goal-free evalua- 
tor? Scriven comes to see the need for goal-free evaluators 
because he questions the goals (or objectives) specified by 
project personnel as potentially not being expressions of 
"demonstrated need" or as being ambiguously stated. If 
this is the case, then why must the evaluator wait for the 
program to become fully implemented before providing 
evaluative feedback on the rightness of goals? In part, this
lack of foresight attributed to a goal-based evaluator (GBE) 
by Scriven is related to his rather limited definition of the 
role of the evaluator. Scriven thinks of the evaluator as
participating in formative and summative evaluation, in 
essence limiting the evaluative engagement to the period 
following the adoption of the educational program. This 
oversight is corrected in the evaluation model of the Center 
for the Study of Evaluation in which we conceive of the 
evaluative responsibility beginning with "needs
assessment." In the needs assessment stage the evaluator assists
in providing explicit data as to the relevance of stated goals 
to real and demonstrated needs. Scriven's goal-free evalua- 
tion is in essence a retrospective (and non-explicit) needs 
assessment. This would be all right, but for the fact that
performing this function retrospectively raises the cost
enormously, not only of the evaluation but of a program
that may have gone astray and which could have been
brought back on course at an earlier time.

If the goals that are alleged are not the "real" ones or the 
"right" ones then let the GBE establish a procedure, an 
explicit procedure, for determining the goals. If mere
rhetoric constituted the supporting evidence then let the GBE
do a better job in assessing the goals. Condemning the GBE
procedure because of inadequacies in its execution does 
not solve the problem. Performing a better job of GBE does 
offer some hope. 



RESULTS RATHER THAN RHETORIC 

W. James Popham 






Whether Michael Scriven ever uttered the phrase "re- 
sults rather than rhetoric" I am not certain. I came away 
from a conference many months ago in Colorado thinking 
that he had. It was there that Michael was testing an early 
conception of his goal-free evaluation position. If he didn't
use that particular phrase, he probably won't be too
displeased if I attribute it to him. After all, not only is the
phrase alluringly alliterative, but it conveys a commitment 
to empirical evidence and a dismissal of mere word wiz- 
ardry. And Michael Scriven has a strong allegiance to em- 
pirical methods and a special flair for walloping word 
wizards. I don't think he'd mind the attribution at all.

But beyond questions of its ancestry, the idea of results 
rather than rhetoric, as embodied in Scriven's goal-free
evaluation writings, provides a useful caution to those 
educators who have recently become so enamored of in- 
structional objectives that they think the mere act of articu- 
lating their goals precisely is not only the beginning but 
the end of the instructional ball game. And as you can 
learn from any baseball pitcher who has set out in the first 
inning to pitch a shutout, the game's final score is the thing 
that counts, not good intentions. Goal-based evaluation has 
offered educators a way of counteracting the heavy
emphasis on instructional process which has been so
fashionable in our country for years. GBE made it easier to
describe intended instructional effects, then see if they



were actually produced. But, as Professor Scriven's goal- 
free evaluation paper reminds us, GBE has often led to a 
tunneling of vision so that important results of instruction 
were overlooked. If GFE does nothing more than remind 
educators to appraise an educational undertaking on the 
basis of all its important effects, not just those which were 
described beforehand (even in flawlessly fashioned behavi- 
oral objectives), then GFE will have been a useful 
contribution. 

But while the logic of Scriven's GFE stance is com- 
mendable, there are a few implementation operations which 
currently vex me. It's so early in the GFE game that
Professor Scriven hasn't had time to wrestle with all of them. He
undoubtedly will in time. 

First, there was a clear implication in several of his early
essays on GFE that the GFE'r could derive special raptures 
from spotting the educational catastrophes that a goal- 
blinded evaluator would not discern. Scriven spent a fair 
amount of time describing how the GFE'r would "set 
snares" to pick up a program's effects. While discovering 
all important effects is the proper province of the GFE'r,
one had the distinct impression that his real kicks came 
from isolating an undiagnosed malignancy. We'll have to 
see whether goal-free evaluators can be trained so that they 
develop a balanced search for the beneficial as well as the 
harmful results of an instructional program. 



Second, there is a practical difficulty which the GFE'r
will have trouble resolving, particularly in a formative
context. If it is true, as Scriven contends, that actual effects
must be evaluated against "a profile of demonstrated
needs," then clearly the GFE'r will either have to conduct
some sort of an independent needs assessment or must rely
on an existing effort to demonstrate needs. Relying on an 
existing needs assessment operation, particularly if carried 
out by the staff of the project being evaluated, carries with 
it the same deficits as GBE; that is, there may be subtle 
project staff biases operating which distort the validity of 
the assessment. But conducting an independent needs 
assessment is costly business and may not be considered 
cost-effective by the project's management. These problems 
may be more easily resolved in the summative context, be- 
cause the stakes are often perceived as higher and a sum- 
mative evaluator may therefore more readily be able to 
demand the resources needed to secure an unbiased needs 
profile. But for a formative evaluator, I think this is a sticky
problem. We want to foster as much independence for our
GFE'r as possible, yet a totally independent needs assess- 
ment seems uneconomical. 

A third problem stems from the degree to which a GFE'r 
can remain insulated from the instructional designer's goal 
preferences when it comes to devising the measures
required to assess program impact on learners. In the abstract
it is easy for a GFE'r to turn off the instructional designer 
who is about to spout goal talk. In constructing tests, obser- 
vation scales, unobtrusive measures, etc., the GFE'r needs 
to have some kind of clues regarding what results the
instruction is apt to yield. But as the requisite inferences are
made from instructional procedures, materials, etc., there
will be a strong likelihood that the project goals will
insinuate themselves in the perceptions of the GFE'r. I suspect,
therefore, that the possibility of keeping GFE completely
uncontaminated by goal preferences is unrealistic. We
must make it as goal-free as we can.

A final problem with GFE is that many educators, terrorized
by the possible repercussions of goal-based evaluation, will
use GFE as a philosopher-approved excuse for
chucking out goals altogether. Yet Scriven makes it very 
clear that goals are required for planning, production, and 
internal evaluation. We must guard against those who will 
try to use GFE as an intellectually respectable cover for not 
thinking rigorously about their educational intentions. 

Goal-free evaluation is destined to become very popular 
among educational folk. It is new. It was sired by an
eminent academic philosopher who, all blessings abound,
speaks with an educated British accent. I can see future
evaluators clamoring for specially designed GFE blinders
to protect them from the taint of project goals. Short
courses in snare-setting will be conducted jointly by
university departments of education and state game commissions.
GFE will be IN. 

But, because I have been persuaded by an eminent
academic philosopher who speaks with an educated British 
accent, I'll have to wait until all this GFE stuff has been 
tried out in a good number of real educational evaluations. 
You see, I've recently become somewhat committed to 
results rather than rhetoric. 



GOAL-FULL EVALUATION 

George F. Kneller 






Professor Scriven advocates goal-free evaluation as a 
remedy for certain weaknesses in contemporary research 
design. The remedy, however, is unnecessary, since, as I 
shall point out, these weaknesses can be corrected more 
efficiently by modifying either the design itself, or the 
training of evaluators, or both. 

Scriven's most substantial argument in favor of goal-free 
evaluation is that the more an evaluator concerns himself 
with the goals of a project, the less likely he is to notice 
the project's side-effects. This tendency, however, may be 
corrected in two ways without resort to GFE: first, by train- 
ing evaluators to observe both goals (and outcomes) and 
side-effects; second, by the researcher's specifying as many 
likely side-effects as possible within the original research 
design. Thus the researcher himself gathers many of the 
relevant data while conducting his own project. 

Scriven also argues that the use of GFE makes it harder 
for an evaluator to persuade himself that the goals of a 
project have been achieved simply because they have been 
set. But bias in favor of goals is only one of many biases to 
which evaluators are subject, and little is gained by seeking 
to eliminate this form of subjectivity while leaving other 
forms untouched. The wisest course is not to rely on GFE 
to eliminate one form of subjectivity but to train evaluators 
in advance to be objective judges in as many respects as 
possible. 

Scriven's other arguments in favor of GFE carry little 
weight: 

• He maintains that research projects often are designed 
to attain grandiose goals which distract the evaluator's 



attention from the project's actual achievements. I reply 
that (a) evaluators should be trained to spot and to criti- 
cize grandiose goals, and (b) researchers should be 
trained to set realistic goals. 

• Similarly Scriven asserts that the alleged goals of a 
project often differ from the real ones. Once again, how- 
ever, proper training should (a) correct this sort of mis- 
understanding in the researcher, and (b) improve the 
evaluator's ability to recognize the discrepancy when it 
occurs. 

• Scriven also claims that the goals of many research 
designs are too vague. Indeed, they may be. But the way 
to eliminate the fault is not to introduce GFE after the 
event but to educate researchers to draw up their designs 
more carefully at the outset. 

• Scriven calls for GFE on the grounds that projects 
often fail to achieve their goals. But unless we take these 
goals into account, we shall never know which projects 
have succeeded in their aims and which have not. 

The frailties which Scriven correctly criticizes in re- 
search designers may also be found in evaluators, goal-free 
or otherwise. These are human frailties, and they may come 
into play anywhere in the course of a project from its pre- 
liminary drafting to its completion. It is not enough, there- 
fore, to provide one particular safeguard by introducing 
GFE after the project is finished. Instead, safeguards 
should be built into the design at many points. 

Also, Scriven makes no provision for the defense of re- 
searchers against the bias of his breed of evaluators. In my 
view, researchers are fully entitled to object to evaluators 




who are not concerned to find out what the goals are of the
projects they are examining. I am not arguing for a balance
of power between researchers and evaluators, but I am
saying that evaluators, too, can become irrational and
immoderate.

Scriven's metaphors and analogies are bright and amusing,
but they do not make for tight logical argument. Moreover,
some of them are ill-conceived. The goal-free evaluator, he
says, once having trained to be free of a security blanket
of goals, "like riding a bicycle or swimming without the
aids one uses at first, [experiences] a remarkable sense of
freedom, of liberation." In fact, (a) aids are not in the
same class as goals, and (b) security blankets may save the
lives of those who feel so free that they outswim (lose
control of) themselves.



If I had to make a choice, I would reject GFE, not only in
consequence of the counter-arguments I draw above but
also because of my gestaltist tendency to see things in more
or less complete patterns, hence to take the goals of
enterprises into consideration. In any case, I do not see the
issue as theoretical but as one appealing to taste, about which
there can be no dispute. One simply makes a choice and,
if called upon to justify the choice, can offer only personal
opinions.

Scriven's essay is not without nuggets of wisdom. For
example, he points out: "But in no case is it proper to use
anyone's goals as the standard unless they be shown
to be the appropriate ones and morally defensible." This
remark should be taken seriously not only by researchers
and evaluators but by people in every walk of life.



EVALUATION COMMENT, THE JOURNAL OF EDUCATIONAL
EVALUATION, is published by the Center for the
Study of Evaluation, University of California, Los Angeles.
CSE, which is one of eight educational research and
development centers in the country, was founded in June, 1966, by
the U.S. Office of Education under the Cooperative Research
Act. It is the only federally funded research and
development center working exclusively on problems in educational
evaluation.

Each issue of the journal will discuss topics in
educational evaluation by presenting articles on evaluation theory,
procedures, methodologies, or practices. For each issue of
the journal, CSE will request contributions on a specific
topic in evaluation from recognized experts in the field;
unsolicited manuscripts will not be accepted.

Evaluation Comment has a current circulation of over 
8,000 readers. Each scholar, researcher, or practitioner on 
our mailing list receives a free copy of the journal. Where 
additional copies are needed readers are encouraged to re- 
produce the Comment themselves. To be placed on our 
mailing list, please write to: 

James Burry, Managing Editor
Evaluation Comment
Center for the Study of Evaluation 
145 Moore Hall 

University of California, Los Angeles 
Los Angeles, California 90024 



University of California
Center for the Study of Evaluation
145 Moore Hall
405 Hilgard Avenue
Los Angeles, California 90024






